#Import statements
import pandas as pd
import matplotlib.pyplot as plt
import folium
import plotly.express as px
from plotly.subplots import make_subplots
import numpy as np
Female Employment
Pavana Atawale
August 4, 2023
Introduction
I decided to focus on the topic of discrimination on the basis of sex, specifically in the professional sector. As a woman who is involved in the Computer Science field, I am extremely aware of the discrepancies between how my male peers (in academia or industry) are treated compared to myself and my female peers. I have heard many horror stories from other women in my life about their professional experiences being hampered by their gender. I wondered if this was a general trend, or only in the CS field. I decided to use this project as a way for me to explore not only the presence of this discrimination, but also the possible causes of these discrepancies in the way women are treated at work.
I am aiming to answer the following question: What factors play a role in the differences in female and male employment?
This project is built upon the project I worked on for a previous class I took at UCLA (DH 101). For that project, we used the same dataset, but focused on a broad range of questions relating to this topic. For this project, I chose to focus more specifically on the question that most interested me.
In my previous class, we were inspired by the dataset we were assigned, specifically the entrepreneurship section. However, as we progressed, I found myself becoming more fascinated with the broader issue of gender-based discrimination in employment. In countries throughout the world, employment is affected by a range of factors. Throughout this project, I attempted to uncover some of the factors that influence female employment.
Methods
For my project, I used data from the World Bank’s Gender Data Portal, which presents gender statistics, primarily from the United Nations and UNESCO. Its aim is to improve understanding of gender data and to enable analyses that can inform policy decisions.
The dataset relies upon census data from member countries, surveys from other organizations, and existing research done by the United Nations. To streamline and standardize their data collection methods, the World Bank works with international organizations like the United Nations to adhere to strict standards of measurement.
The dataset contains various thematic indicators, each having entries for a variety of countries across the world. Each country also has entries for different years, ranging from 1970 to 2022. The indicators are split into different categories, such as education, leadership, etc. For this project, I decided to focus on the topics of employment and health.
My process for this project was as follows:
* I imported, cleaned, and sorted through the datasets.
* I explored the different indicators, countries, and years.
  * I did so by printing all the different indicators and filtering through them.
* Then, I identified important indicators that could answer my question.
* I created visualizations to illustrate the connections between indicators, countries, and years.
* I analyzed the visualizations and used them to answer my question.
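The indicator exploration step can be illustrated with a small sketch. It assumes a long-format table with an 'Indicator Name' column, like the datasets used here. The rows below are made up for illustration, and `find_indicators` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up stand-in for the long-format dataset (not real World Bank values)
toy_df = pd.DataFrame({
    "Indicator Name": [
        "Employment in agriculture, female (% of female employment)",
        "Employment in agriculture, male (% of male employment)",
        "Length of paid maternity leave (calendar days)",
        "Employment in agriculture, female (% of female employment)",
    ],
    "Country Name": ["A", "A", "A", "B"],
    "Year": [2020, 2020, 2020, 2020],
    "Value": [10.0, 12.0, 98.0, 30.0],
})

def find_indicators(df, keyword):
    """Return the sorted unique indicator names that contain `keyword` (case-insensitive)."""
    names = pd.Series(df["Indicator Name"].unique())
    mask = names.str.contains(keyword, case=False, regex=False)
    return sorted(names[mask])

matches = find_indicators(toy_df, "employment in")
for name in matches:
    print(name)
```

Searching by keyword like this is one way to narrow a long list of indicators down to the few that relate to a question.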
I also found a GeoJSON file from the World Bank. I had to edit that file separately so that the country names in it matched the names in my datasets. This was done manually on my computer, before I uploaded the file to be used for the mapping here.
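A quick check like the following can list which names need editing before doing it by hand. This is only a sketch: the tiny GeoJSON object and the dataset names are illustrative, `unmatched_names` is a hypothetical helper, and it assumes the country name is stored in each feature's `NAME_EN` property.

```python
import json

# Tiny illustrative GeoJSON; a real file would be read with json.load(open(path))
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME_EN": "Egypt"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME_EN": "Russia"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME_EN": "India"}, "geometry": None},
    ],
})
geojson = json.loads(geojson_text)

# Illustrative country names as they might appear in the dataset
dataset_names = ["Egypt, Arab Rep.", "Russian Federation", "India"]

def unmatched_names(geojson, dataset_names, prop="NAME_EN"):
    """Return (names only in the dataset, names only in the GeoJSON), both sorted."""
    geo_names = {feature["properties"][prop] for feature in geojson["features"]}
    data_names = set(dataset_names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)

only_in_data, only_in_geo = unmatched_names(geojson, dataset_names)
print("In dataset but not in GeoJSON:", only_in_data)
print("In GeoJSON but not in dataset:", only_in_geo)
```

Any country that appears in either list would be left blank on a choropleth, since the map joins the two sources on the exact name.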
Results
Data Exploration
#Import datasets from github repository
from zipfile import ZipFile
zip_file = ZipFile('Files/datasets.zip')
employ_df = pd.read_csv(zip_file.open('Employment.csv'))
employ_df = employ_df[['Indicator Name','Country Name', 'Year', 'Value']].copy()
health_df = pd.read_csv(zip_file.open('Health.csv'))
health_df = health_df[['Indicator Name','Country Name', 'Year', 'Value']].copy()
employ_ind = employ_df['Indicator Name'].unique()
health_ind = health_df['Indicator Name'].unique()
geo = "https://raw.githubusercontent.com/patawale/DH140FinalProject/main/WorldBank.geojson"
Data Analysis
Female and Male Employment in Agriculture, Industry, and Services
“How does female and male employment differ based on business sector?”
Whether or not a woman is employed is not the only factor I am considering when looking at the differences in female and male employment. The industry in which a woman is employed is also important. If one with no prior knowledge of this topic were to consider the question above, they might answer that there is no difference, and that women and men are employed in all sectors. However, based on my previous experience, that might not be entirely true. In the following graph, I explore female and male employment in three different business sectors (agriculture, industry, and services), grouped by income status.
#List of 'Country Name' values that involve groups of countries in different income classes
incomes = ['Low income', 'Lower middle income', 'MIC', 'Upper middle income', 'High income']
#Filter dataset to only include relevant data
df = employ_df[employ_df['Indicator Name'].str.contains('Employment in ')]
df = df[~df['Indicator Name'].str.contains('total employment')]
df = df[df['Country Name'].isin(incomes)]
df = df.replace("MIC", "Middle income")
#Create and display a bar graph using plotly
fig = px.bar(df, x="Country Name", y="Value",
             color='Indicator Name', hover_name='Indicator Name', barmode='group',
             animation_frame="Year",
             labels={
                 "Country Name": "Income level",
                 "Indicator Name": "Industry and Gender Descriptor",
                 "Value": "% of Employment"
             },)
fig.update_layout(legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))
fig.show()
This visualization contrasts the percentage of men and women employed in three different sectors of business: agriculture, industry, and services. The data was collected from different countries around the world, grouped by income status. From this graph, it can be seen that in every income group, the percentage of men in each business sector surpasses the percentage of women in the same sector. Thus, in total, men are present in business more than women. More specifically, we can see patterns forming, based on the women in different income groups. In higher-income groups, women are most likely to be working in the service industry. However, in lower-income groups, women are most likely to be working in the agriculture industry. Thus, we can see that income influences the type of employment that women achieve.
Overall, we can clearly see that women, regardless of income class, are more likely to be working in the service industry than men in the same income class. On the other hand, regardless of income, men are more likely to be working in industry. Thus, regardless of income, there are still differences in the business sectors in which men and women are employed.
Note: to clarify, this data was collected by grouping countries into income levels (low income to high income), then reporting the compiled data from each of these countries. It does not account for income disparities within a country.
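To put numbers on a comparison like this, the long-format rows can be pivoted so that the female and male values for each sector sit side by side. The sketch below uses made-up values. It assumes indicator names of the form "Employment in <sector>, <gender> (...)"; the regular expression and the `gap` column are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# Made-up values for two income groups and two sectors (not World Bank data)
toy = pd.DataFrame({
    "Indicator Name": [
        "Employment in agriculture, female (% of female employment)",
        "Employment in agriculture, male (% of male employment)",
        "Employment in services, female (% of female employment)",
        "Employment in services, male (% of male employment)",
    ] * 2,
    "Country Name": ["Low income"] * 4 + ["High income"] * 4,
    "Year": [2019] * 8,
    "Value": [60.0, 55.0, 30.0, 28.0, 2.0, 4.0, 88.0, 65.0],
})

# Split each indicator name into a sector and a gender (assumed naming pattern)
parts = toy["Indicator Name"].str.extract(
    r"Employment in (?P<sector>\w+), (?P<gender>female|male)"
)
tidy = pd.concat([toy[["Country Name", "Value"]], parts], axis=1)

# One row per (income group, sector), one column per gender
wide = tidy.pivot_table(index=["Country Name", "sector"], columns="gender", values="Value")
wide["gap"] = wide["female"] - wide["male"]  # positive: larger share among women
print(wide)
```

A positive `gap` means the sector accounts for a larger share of women's employment than of men's in that income group, and a negative one means the reverse.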
Effects of Maternity Leave on Female Labor Force
The first major issue I can think of that might have an impact on the differences in female and male employment is maternity leave. Whether or not a woman is supported during her maternity leave would have a notable effect on her willingness and ability to rejoin the labor force. In the following graph, I explore the effects of the length of maternity leave on female participation in the labor force.
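When countries have entries for different years, a cross-country comparison needs a single value per country, typically the most recent one. A minimal sketch of one way to select it with pandas, using made-up numbers and hypothetical country labels:

```python
import pandas as pd

# Made-up long-format rows (hypothetical countries and values, not World Bank data)
toy_lfp = pd.DataFrame({
    "Country Name": ["A", "A", "B", "C", "C"],
    "Year": [2018, 2021, 2020, 2019, 2015],
    "Value": [50.0, 52.0, 60.0, 45.0, 40.0],
})

# Sort by year, then keep the last (most recent) row seen for each country
latest = (
    toy_lfp.sort_values("Year")
    .drop_duplicates(subset="Country Name", keep="last")
    .sort_values("Country Name")
    .reset_index(drop=True)
)
print(latest)
```

Sorting explicitly by year means the result does not depend on the order in which rows appear in the source file.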
# Filter dataset to only include relevant data
df1 = employ_df[employ_df['Indicator Name'] == 'Length of paid maternity leave (calendar days)']
df1 = df1[df1['Year'] == 2022]
df1 = df1[['Country Name', 'Value']].copy()
#print(df1.sort_values(by=['Value']))
# Filter dataset to only include relevant data
# Note: I had to filter in a slightly more complicated way than usual,
# because I needed to get the latest data from each country.
# This keeps the first matching row for each country, so it returns the latest
# year only if each country's rows are contiguous and listed newest first.
indices = []
prev_country = ""
for index, row in employ_df.iterrows():
    indic = row['Indicator Name']
    country = row['Country Name']
    if indic == 'Labor force participation rate, female (% of female population ages 15+) (national estimate)':
        if country != prev_country:
            indices.append(index)
            prev_country = country
# iterrows() yields index labels, so select with .loc rather than .iloc
df2 = employ_df.loc[indices]
df2 = df2[['Country Name', 'Value']].copy()
#print(df)
#print(df2.sort_values(by=['Value']).to_string())
#Create and display both maps
def addTooltip(m):
    # Near-transparent base style, with a darker fill when a country is hovered
    style_function = lambda x: {'fillColor': '#ffffff',
                                'color':'#000000',
                                'fillOpacity': 0.1,
                                'weight': 0.1}
    highlight_function = lambda x: {'fillColor': '#000000',
                                    'color':'#000000',
                                    'fillOpacity': 0.50,
                                    'weight': 0.1}
    NIL = folium.features.GeoJson(
        geo,
        style_function=style_function,
        control=False,
        highlight_function=highlight_function,
        tooltip=folium.features.GeoJsonTooltip(
            fields=['NAME_EN','INCOME_GRP'], # use fields from the json file
            aliases=['Country: ','Income: '],
            style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;")
        )
    )
    m.add_child(NIL)
    m.keep_in_front(NIL)
    folium.LayerControl().add_to(m)
m = folium.Map(location=[40,0], zoom_start=2)
folium.Choropleth(
    geo_data=geo,
    data=df1,
    columns=["Country Name", "Value"],
    key_on="feature.properties.NAME_EN",
    legend_name="Length of paid maternity leave (calendar days)",
    fill_color="Reds",
).add_to(m)
addTooltip(m)
display(m)
m = folium.Map(location=[40,0], zoom_start=2)
folium.Choropleth(
    geo_data=geo,
    data=df2,
    columns=["Country Name", "Value"],
    legend_name="Female Labor Force Participation Rate (%)",
    key_on="feature.properties.NAME_EN",
    fill_color="Blues",
).add_to(m)
addTooltip(m)
display(m)
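Comparing two maps by eye can be supplemented with a simple numeric check. The sketch below joins two per-country tables and computes a Pearson correlation. The numbers and country labels are made up, the column suffixes are illustrative, and a correlation on its own would not show that leave length causes any change in participation.

```python
import pandas as pd

# Made-up per-country tables shaped like df1 and df2 (not World Bank data)
leave_days = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "Value": [90.0, 120.0, 0.0, 180.0],
})
participation = pd.DataFrame({
    "Country Name": ["A", "B", "C", "E"],
    "Value": [52.0, 60.0, 45.0, 70.0],
})

# Inner join keeps only countries present in both tables
merged = leave_days.merge(participation, on="Country Name", suffixes=("_leave", "_lfp"))

# Pearson correlation between leave length and participation rate
r = merged["Value_leave"].corr(merged["Value_lfp"])
print(merged)
print("Pearson r:", round(r, 3))
```

Countries missing from either table drop out of the join, so it is worth checking how many rows remain before reading anything into the coefficient.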